library(easypackages)
libraries("tidyverse", "fpp3", "patchwork", "plotly")
06 Linear Regression Summary
These models are based on finding linear relationships between the series to be forecast and one or more other series. We therefore forecast the values of one series from the changes in another series that affects it.
Example: ice cream sales are related to temperature, and Nike's sales are related to how much it spends on advertising and marketing.
We therefore have one forecast variable and one or more predictor variables.
Linear model
Simple linear regression model:
\[y_t = \beta_0 + \beta_1x_t + e_t\]
where:
- \(\beta_0\) is the intercept and represents the predicted value when x = 0.
- \(\beta_1\) is the slope of the line. It indicates the average change in y for a one-unit change in x.
- \(e_t\) is the error term; it is assumed to be random and captures the effects of all other variables that may also affect \(y_t\) but are not specified.
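To make the roles of \(\beta_0\), \(\beta_1\) and \(e_t\) concrete, here is a minimal sketch with simulated data (the coefficient values 2 and 0.5 are arbitrary illustration values) showing that `lm()` recovers the parameters:

```r
# Simulate y_t = beta0 + beta1 * x_t + e_t and fit a simple regression
set.seed(123)
x <- rnorm(100)              # predictor
e <- rnorm(100, sd = 0.3)    # random error term
y <- 2 + 0.5 * x + e         # true beta0 = 2, beta1 = 0.5 (arbitrary)
fit <- lm(y ~ x)
coef(fit)                    # estimates land near (2, 0.5)
```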
Example
Growth rates of consumption expenditure (y) and their relationship with personal disposable income (x):
us_change %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Consumption, colour = "Consumo")) +
geom_line(aes(y = Income, colour = "Ingreso")) +
ylab("cambio %") + xlab("Año") +
guides(colour=guide_legend(title="Series")) +
theme(legend.position = "top")
Scatterplot of the two series:
us_change %>%
ggplot(aes(x=Income, y=Consumption)) +
ylab("Consumo (cambio % trimestral)") +
xlab("Ingreso (cambio % trimestral)") +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
The line follows the linear regression equation.
Interpretation
In its modern use, regression analysis studies the dependence of a variable y on one or more exogenous (regressor) variables x, in order to predict the mean value of the dependent variable.
When we speak of linear functions, we may be referring either to the variables x or to the parameters \(\beta\). In a linear regression model, only linearity in the parameters matters.
Hence the functions of x may be curves of almost any shape (exponential, quadratic, cubic) and still be linear in the parameters, so they can be estimated with a linear regression model.
Thus, a linear regression model can produce a straight line or a curve, depending on the functional form.
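For instance (a minimal sketch with simulated data and arbitrary coefficients), a quadratic relationship is nonlinear in x but linear in the parameters, so `lm()` estimates it directly:

```r
# Quadratic in x, but still linear in beta0, beta1, beta2
set.seed(1)
x <- seq(-2, 2, length.out = 200)
y <- 1 + 0.5 * x - 1.5 * x^2 + rnorm(200, sd = 0.2)
fit <- lm(y ~ x + I(x^2))    # I() protects the squared term in the formula
coef(fit)                    # estimates near (1, 0.5, -1.5)
```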
Examples
US % change
\[y_{consumption} = \beta_0 + \beta_1x_{income} +\beta_2x_{production}+\beta_3x_{savings}+\beta_4x_{unemployment} + e_t\]
us_change
# A tsibble: 198 x 6 [1Q]
Quarter Consumption Income Production Savings Unemployment
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1970 Q1 0.619 1.04 -2.45 5.30 0.9
2 1970 Q2 0.452 1.23 -0.551 7.79 0.5
3 1970 Q3 0.873 1.59 -0.359 7.40 0.5
4 1970 Q4 -0.272 -0.240 -2.19 1.17 0.700
5 1971 Q1 1.90 1.98 1.91 3.54 -0.100
6 1971 Q2 0.915 1.45 0.902 5.87 -0.100
7 1971 Q3 0.794 0.521 0.308 -0.406 0.100
8 1971 Q4 1.65 1.16 2.29 -1.49 0
9 1972 Q1 1.31 0.457 4.15 -4.29 -0.200
10 1972 Q2 1.89 1.03 1.89 -4.69 -0.100
# … with 188 more rows
us_change %>%
as_tibble() %>%
select(-Quarter) %>%
GGally::ggpairs()
us_change %>%
pivot_longer(cols = -Quarter) %>%
ggplot(aes(x = Quarter, y = value, color = name)) +
geom_line() +
facet_wrap(~ name, scales = "free_y") +
theme(legend.position = "none")
Comparing the forecast variable (Consumption) against the other predictor variables:
us_change %>%
pivot_longer(cols = -c(Quarter, Consumption)) %>%
ggplot(aes(x = Quarter, y = value, color = name)) +
geom_line() +
geom_line(aes(y = Consumption), color = "black") +
facet_wrap(~ name, scales = "free_y") +
theme(legend.position = "none")
Simple linear regression
Fitting a first model, using disposable income as the predictor.
fit1 <- us_change %>%
model(reg_lin_simple = TSLM(Consumption ~ Income)
)
fit1 %>% report()
Series: Consumption
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-2.58236 -0.27777 0.01862 0.32330 1.42229
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.54454 0.05403 10.079 < 2e-16 ***
Income 0.27183 0.04673 5.817 2.4e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5905 on 196 degrees of freedom
Multiple R-squared: 0.1472, Adjusted R-squared: 0.1429
F-statistic: 33.84 on 1 and 196 DF, p-value: 2.4022e-08
augment(fit1) %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Consumption, color = "Datos")) +
geom_line(aes(y = .fitted, color = "Fitted"))+
xlab("Año") + ylab(NULL) +
ggtitle("Cambios porcentuales en el gasto de Consumo en EEUU") +
guides(color = guide_legend(title = NULL))
It does not seem to fit well.
augment(fit1) %>%
ggplot(aes(x = Consumption, y = .fitted)) +
geom_point() +
ylab("Fitted (valores ajustados)") +
xlab("Datos (reales históricos)") +
ggtitle("Cambios porcentuales en el gasto de Consumo en EEUU") +
geom_abline(intercept = 0, slope = 1)
fit1 %>%
gg_tsresiduals()
augment(fit1) %>%
features(.resid, ljung_box, lag = 10, dof = 2)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 reg_lin_simple 35.1 0.0000259
There is still plenty of room for improvement.
Multiple linear regression
\[y_{consumption} = \beta_0 + \beta_1x_{income} +\beta_2x_{production}+\beta_3x_{savings}+\beta_4x_{unemployment} + e_t\]
fit2 <- us_change %>%
model(
reg_lin_multiple = TSLM(Consumption ~ Income + Production + Savings + Unemployment)
)
report(fit2)
Series: Consumption
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-0.90555 -0.15821 -0.03608 0.13618 1.15471
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.253105 0.034470 7.343 5.71e-12 ***
Income 0.740583 0.040115 18.461 < 2e-16 ***
Production 0.047173 0.023142 2.038 0.0429 *
Savings -0.052890 0.002924 -18.088 < 2e-16 ***
Unemployment -0.174685 0.095511 -1.829 0.0689 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3102 on 193 degrees of freedom
Multiple R-squared: 0.7683, Adjusted R-squared: 0.7635
F-statistic: 160 on 4 and 193 DF, p-value: < 2.22e-16
augment(fit2) %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Consumption, color = "Datos")) +
geom_line(aes(y = .fitted, color = "Fitted"))+
xlab("Año") + ylab(NULL) +
ggtitle("Cambios porcentuales en el gasto de Consumo en EEUU") +
guides(color = guide_legend(title = NULL))
augment(fit2) %>%
ggplot(aes(x = Consumption, y = .fitted)) +
geom_point() +
ylab("Fitted (valores ajustados)") +
xlab("Datos (reales históricos)") +
ggtitle("Cambios porcentuales en el gasto de Consumo en EEUU") +
geom_abline(intercept = 0, slope = 1)
fit2 %>%
gg_tsresiduals()
augment(fit2) %>%
features(.resid, ljung_box, lag = 10, dof = 2)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 reg_lin_multiple 18.9 0.0156
df <- left_join(us_change, residuals(fit2), by = "Quarter")
df %>%
select(-c(Consumption, .model)) %>%
pivot_longer(cols = c(Income:Unemployment)) %>%
ggplot(aes( x = value, y = .resid, color = name)) +
geom_point() + ylab("Residuales") + xlab("Predictoras") +
facet_wrap(~ name, scales = "free_x") +
theme(legend.position = "none")
augment(fit2) %>%
ggplot(aes(x = .fitted, y = .resid)) +
geom_point() +
labs(x = "Ajustados", y = "Residuales")
glance(fit2) %>%
select(adj_r_squared, AIC, AICc, BIC)
# A tibble: 1 × 4
adj_r_squared AIC AICc BIC
<dbl> <dbl> <dbl> <dbl>
1 0.763 -457. -456. -437.
Predictor selection
Choose a subset of predictors and test it:
fit3 <- us_change %>%
model(r1 = TSLM(Consumption ~ Income),
r2 = TSLM(Consumption ~ Income + Production),
r3 = TSLM(Consumption ~ Income + Production + Savings + Unemployment),
r4 = TSLM(Consumption ~ Income + Production + Savings),
r5 = TSLM(Consumption ~ Income + Savings + Unemployment),
r6 = TSLM(Consumption ~ Income + Production + Unemployment),
r7 = TSLM(Consumption ~ Income + Savings)
)
fit3 %>%
glance() %>%
select(.model, adj_r_squared, AIC, AICc, BIC)
# A tibble: 7 × 5
.model adj_r_squared AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl>
1 r1 0.143 -205. -204. -195.
2 r2 0.336 -254. -254. -241.
3 r3 0.763 -457. -456. -437.
4 r4 0.761 -455. -455. -439.
5 r5 0.760 -454. -454. -438.
6 r6 0.366 -262. -262. -246.
7 r7 0.735 -436. -436. -423.
fit3 %>%
select(r3) %>%
report()
Series: Consumption
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-0.90555 -0.15821 -0.03608 0.13618 1.15471
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.253105 0.034470 7.343 5.71e-12 ***
Income 0.740583 0.040115 18.461 < 2e-16 ***
Production 0.047173 0.023142 2.038 0.0429 *
Savings -0.052890 0.002924 -18.088 < 2e-16 ***
Unemployment -0.174685 0.095511 -1.829 0.0689 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3102 on 193 degrees of freedom
Multiple R-squared: 0.7683, Adjusted R-squared: 0.7635
F-statistic: 160 on 4 and 193 DF, p-value: < 2.22e-16
Backwards stepwise regression
- Start with a model containing all the predictors.
- Remove one predictor at a time.
- Keep or discard each reduced model depending on whether it improves the fit.
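The same backward search can be automated with base R's `step()`, which drops terms by AIC. A hedged sketch on the built-in `mtcars` data (standing in for `us_change`, since `step()` works on ordinary `lm()` fits rather than `TSLM()` models):

```r
# Backward stepwise by AIC: start from the full model and let step() drop terms
full <- lm(mpg ~ cyl + disp + hp + wt, data = mtcars)
reduced <- step(full, direction = "backward", trace = FALSE)
formula(reduced)   # the surviving subset of predictors
```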
us_change %>%
model(i = TSLM(Consumption ~ Income + Production + Savings + Unemployment),
ii = TSLM(Consumption ~ Income + Production + Savings),
iii = TSLM(Consumption ~ Income + Production + Unemployment),
iv = TSLM(Consumption ~ Income + Savings + Unemployment),
v = TSLM(Consumption ~ Production + Savings + Unemployment)
) %>%
glance() %>%
select(.model, adj_r_squared, AIC, AICc, BIC)
# A tibble: 5 × 5
.model adj_r_squared AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl>
1 i 0.763 -457. -456. -437.
2 ii 0.761 -455. -455. -439.
3 iii 0.366 -262. -262. -246.
4 iv 0.760 -454. -454. -438.
5 v 0.349 -257. -257. -241.
Based on AICc, the model that includes all the predictors is best.
Based on BIC, the model that excludes unemployment is best.
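As a reminder of what these criteria measure, a hedged sketch computing them by hand for an ordinary `lm()` fit on the built-in `cars` data (fable's `glance()` uses a slightly different AIC definition, but it differs only by an additive constant, so model rankings should agree):

```r
# AIC, AICc and BIC by hand; k counts coefficients plus the residual variance
fit <- lm(dist ~ speed, data = cars)
n <- nobs(fit)
k <- length(coef(fit)) + 1
aic  <- AIC(fit)
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
bic  <- BIC(fit)
c(AIC = aic, AICc = aicc, BIC = bic)
```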
us_change %>%
model(i = TSLM(Consumption ~ Income + Production + Savings + Unemployment),
ii = TSLM(Consumption ~ Income + Production + Savings),
iii = TSLM(Consumption ~ Income + Production),
iv = TSLM(Consumption ~ Income + Savings),
v = TSLM(Consumption ~ Production + Savings),
vi = TSLM(Consumption ~ Income + Savings + Unemployment),
vii = TSLM(Consumption ~ Production + Savings + Unemployment),
viii = TSLM(Consumption ~ Income + Production + Unemployment)
) %>%
glance() %>%
select(.model, adj_r_squared, AIC, AICc, BIC)
# A tibble: 8 × 5
.model adj_r_squared AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl>
1 i 0.763 -457. -456. -437.
2 ii 0.761 -455. -455. -439.
3 iii 0.336 -254. -254. -241.
4 iv 0.735 -436. -436. -423.
5 v 0.324 -251. -250. -238.
6 vi 0.760 -454. -454. -438.
7 vii 0.349 -257. -257. -241.
8 viii 0.366 -262. -262. -246.
Forwards stepwise regression
- Start with a simple model containing only the intercept.
- Add predictors one at a time.
- Keep only the predictor that most improves the model.
- Iterate.
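These steps can likewise be automated with `step()` in the forward direction (again sketched on the built-in `mtcars` data; `scope` gives the largest model the search may grow to):

```r
# Forward stepwise by AIC: start from the intercept-only model and add terms
null_fit <- lm(mpg ~ 1, data = mtcars)
fwd <- step(null_fit, scope = ~ cyl + disp + hp + wt,
            direction = "forward", trace = FALSE)
formula(fwd)   # the predictors chosen by the search
```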
us_change %>%
model(i = TSLM(Consumption ~ Income),
ii = TSLM(Consumption ~ Production),
iii = TSLM(Consumption ~ Savings),
iv = TSLM(Consumption ~ Unemployment)
) %>%
glance() %>%
select(.model, adj_r_squared, AIC, AICc, BIC)
# A tibble: 4 × 5
.model adj_r_squared AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl>
1 i 0.143 -205. -204. -195.
2 ii 0.276 -238. -238. -228.
3 iii 0.0611 -187. -186. -177.
4 iv 0.274 -237. -237. -228.
us_change %>%
model(
o = TSLM(Consumption ~ Production),
i = TSLM(Consumption ~ Production + Income),
ii = TSLM(Consumption ~ Production + Savings),
iii = TSLM(Consumption ~ Production + Unemployment)
) %>%
glance() %>%
select(.model, adj_r_squared, AIC, AICc, BIC)
# A tibble: 4 × 5
.model adj_r_squared AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl>
1 o 0.276 -238. -238. -228.
2 i 0.336 -254. -254. -241.
3 ii 0.324 -251. -250. -238.
4 iii 0.308 -246. -246. -233.
us_change %>%
model(
i = TSLM(Consumption ~ Production + Income),
ii = TSLM(Consumption ~ Production + Income + Savings),
iii = TSLM(Consumption ~ Production + Income + Unemployment)
) %>%
glance() %>%
select(.model, adj_r_squared, AIC, AICc, BIC)
# A tibble: 3 × 5
.model adj_r_squared AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl>
1 i 0.336 -254. -254. -241.
2 ii 0.761 -455. -455. -439.
3 iii 0.366 -262. -262. -246.
us_change %>%
model(
i = TSLM(Consumption ~ Production + Income + Savings),
ii = TSLM(Consumption ~ Production + Income + Savings + Unemployment)
) %>%
glance() %>%
select(.model, adj_r_squared, AIC, AICc, BIC)
# A tibble: 2 × 5
.model adj_r_squared AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl>
1 i 0.761 -455. -455. -439.
2 ii 0.763 -457. -456. -437.
Forecasting
Ex-ante forecasts
Only information available up to the last observation in the historical data is used. The predictors themselves must therefore be forecast before the variable of interest can be forecast.
# A sketched helper (the predictor must be passed as a string so that
# as.formula() can build the model formula):
# mod_predictoras <- function(predictora, horizonte = 4) {
#   us_change %>%
#     model(ARIMA(as.formula(paste(predictora, "~ pdq() + PDQ()")))) %>%
#     forecast(h = horizonte)
# }
#
# mod_predictoras(predictora = "Income")
ingreso <- us_change %>%
model(ETS = ETS(Income),
ARIMA = ARIMA(Income)
) %>%
forecast(h = 4)
ingreso %>%
autoplot(us_change, level = NULL)
produccion <- us_change %>%
model(
ETS = ETS(Production),
ARIMA = ARIMA(Production)
) %>%
forecast(h = 4)
produccion %>%
autoplot(us_change, level = NULL)
ahorro <- us_change %>%
model(
ETS = ETS(Savings),
ARIMA = ARIMA(Savings)
) %>%
forecast(h = 4)
ahorro %>%
autoplot(us_change, level = NULL)
desempleo <- us_change %>%
model(
ETS = ETS(Unemployment),
ARIMA = ARIMA(Unemployment)
) %>%
forecast(h = 4)
desempleo %>%
autoplot(us_change, level = NULL)
With the predictor forecasts in hand, we generate the forecast for consumption.
fit <- us_change %>%
model(
`Regresión lineal múltiple` = TSLM(Consumption ~ Income + Production + Savings + Unemployment)
)
datos_futuros <- new_data(us_change,4) %>%
mutate(Income = ingreso %>% filter(.model == "ARIMA") %>% pull(.mean),
Savings = ahorro %>% filter(.model == "ARIMA") %>% pull(.mean),
Unemployment = desempleo %>% filter(.model == "ARIMA") %>% pull(.mean),
Production = produccion %>% filter(.model == "ARIMA") %>% pull(.mean))
datos_futuros
# A tsibble: 4 x 5 [1Q]
Quarter Income Savings Unemployment Production
<qtr> <dbl> <dbl> <dbl> <dbl>
1 2019 Q3 0.813 3.34 0.00461 0.131
2 2019 Q4 0.715 0.725 -0.0427 -0.00325
3 2020 Q1 0.729 1.62 0.0248 0.565
4 2020 Q2 0.729 1.32 0.0307 0.345
fc <- forecast(fit, datos_futuros)
fc %>%
autoplot(us_change)
fc %>%
autoplot(us_change %>% filter_index("2016 Q1" ~ .))
Ex-post forecasts
Actual observed values of the predictors are used, so these are no longer genuine forecasts; they are useful for evaluating the regression model itself.
Scenario-based forecasts
fit_escenarios <- us_change %>%
model(lineal = TSLM(Consumption ~ Income + Savings + Unemployment))
# We need to supply new data for the predictors
escenarios <- scenarios(
optimista = new_data(us_change, 4) %>%
mutate(Income = c(0,0.2,0,1.2), Savings = c(0,-0.1,0.5,-1), Unemployment = -0.1),
pesimista = new_data(us_change, 4) %>%
mutate(Income = -1, Savings = -0.5, Unemployment = 0.1),
names_to = "Escenario"
)
fc_escenarios <- fit_escenarios %>%
forecast(new_data = escenarios)
us_change %>%
autoplot(Consumption) +
autolayer(fc_escenarios)
Including useful predictors
For beer production:
recent_production <- aus_production %>%
filter(year(Quarter) >= 1992)
recent_production
# A tsibble: 74 x 7 [1Q]
Quarter Beer Tobacco Bricks Cement Electricity Gas
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1992 Q1 443 5777 383 1289 38332 117
2 1992 Q2 410 5853 404 1501 39774 151
3 1992 Q3 420 6416 446 1539 42246 175
4 1992 Q4 532 5825 420 1568 38498 129
5 1993 Q1 433 5724 394 1450 39460 116
6 1993 Q2 421 6036 462 1668 41356 149
7 1993 Q3 410 6570 475 1648 42949 163
8 1993 Q4 512 5675 443 1863 40974 138
9 1994 Q1 449 5311 421 1468 40162 127
10 1994 Q2 381 5717 475 1755 41199 159
# … with 64 more rows
recent_production %>%
autoplot(Beer) +
labs(x = "Año", y = "Megalitros",
title = "Producción de cerveza trimestral en Australia")
Some predictors can be useful in regression analysis:
1. Trend predictor, trend()
\[y_t = \beta_0 + \beta_1t+e_t\]
2. Seasonal dummy variables
Dummy variables take only two values: 1 or 0.
To add seasonal dummies, you only need to write season().
recent_production %>%
select(Quarter,Beer) %>%
mutate(tendencia = seq_along(recent_production$Quarter),
q2 = if_else(quarter(Quarter)==2,1,0),
q3 = if_else(quarter(Quarter)==3,1,0),
q4 = if_else(quarter(Quarter)==4,1,0)
) %>%
model(TSLM(Beer ~ tendencia + q2 + q3 + q4)) %>%
report()
Series: Beer
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-42.9029 -7.5995 -0.4594 7.9908 21.7895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 441.80044 3.73353 118.333 < 2e-16 ***
tendencia -0.34027 0.06657 -5.111 2.73e-06 ***
q2 -34.65973 3.96832 -8.734 9.10e-13 ***
q3 -17.82164 4.02249 -4.430 3.45e-05 ***
q4 72.79641 4.02305 18.095 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.23 on 69 degrees of freedom
Multiple R-squared: 0.9243, Adjusted R-squared: 0.9199
F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.22e-16
fit_beer <- recent_production %>%
model(TSLM(Beer ~ trend() + season()))
report(fit_beer)
Series: Beer
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-42.9029 -7.5995 -0.4594 7.9908 21.7895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 441.80044 3.73353 118.333 < 2e-16 ***
trend() -0.34027 0.06657 -5.111 2.73e-06 ***
season()year2 -34.65973 3.96832 -8.734 9.10e-13 ***
season()year3 -17.82164 4.02249 -4.430 3.45e-05 ***
season()year4 72.79641 4.02305 18.095 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.23 on 69 degrees of freedom
Multiple R-squared: 0.9243, Adjusted R-squared: 0.9199
F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.22e-16
p <- augment(fit_beer) %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Beer, color = "Datos")) +
geom_line(aes(y = .fitted, color = "Fitted")) +
labs(x = "Año", y = "Megalitros")
ggplotly(p)
One quarter lies well below the fit; we can add an intervention variable for that period.
Spike variables
They capture the effect of a single period.
cerveza <- recent_production %>%
select(Quarter, Beer) %>%
mutate(
q2_94 = if_else(Quarter == yearquarter("1994 Q2"),1,0),
q4_04 = if_else(Quarter == yearquarter("2004 Q4"),1,0)
)
cerveza
# A tsibble: 74 x 4 [1Q]
Quarter Beer q2_94 q4_04
<qtr> <dbl> <dbl> <dbl>
1 1992 Q1 443 0 0
2 1992 Q2 410 0 0
3 1992 Q3 420 0 0
4 1992 Q4 532 0 0
5 1993 Q1 433 0 0
6 1993 Q2 421 0 0
7 1993 Q3 410 0 0
8 1993 Q4 512 0 0
9 1994 Q1 449 0 0
10 1994 Q2 381 1 0
# … with 64 more rows
fit_beer <- cerveza %>%
model(TSLM(Beer ~ trend() + season() + q2_94 + q4_04))
report(fit_beer)
Series: Beer
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-27.0379 -5.8811 -0.6895 7.7209 21.7895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 441.84123 3.32189 133.009 < 2e-16 ***
trend() -0.34137 0.05975 -5.713 2.78e-07 ***
season()year2 -33.39369 3.55788 -9.386 7.83e-14 ***
season()year3 -17.82164 3.55460 -5.014 4.16e-06 ***
season()year4 75.32030 3.60790 20.876 < 2e-16 ***
q2_94 -24.03384 11.24265 -2.138 0.036190 *
q4_04 -45.41027 11.15547 -4.071 0.000126 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.81 on 67 degrees of freedom
Multiple R-squared: 0.9426, Adjusted R-squared: 0.9375
F-statistic: 183.4 on 6 and 67 DF, p-value: < 2.22e-16
p <- augment(fit_beer) %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Beer, color = "Datos")) +
geom_line(aes(y = .fitted, color = "Fitted")) +
labs(x = "Año", y = "Megalitros")
ggplotly(p)
Level shifts
They capture an effect from a given period onwards.
cerveza <- cerveza %>%
mutate(d2000 = if_else(year(Quarter)>=2000,1,0))
cerveza
# A tsibble: 74 x 5 [1Q]
Quarter Beer q2_94 q4_04 d2000
<qtr> <dbl> <dbl> <dbl> <dbl>
1 1992 Q1 443 0 0 0
2 1992 Q2 410 0 0 0
3 1992 Q3 420 0 0 0
4 1992 Q4 532 0 0 0
5 1993 Q1 433 0 0 0
6 1993 Q2 421 0 0 0
7 1993 Q3 410 0 0 0
8 1993 Q4 512 0 0 0
9 1994 Q1 449 0 0 0
10 1994 Q2 381 1 0 0
# … with 64 more rows
fit_beer <- cerveza %>%
model(TSLM(Beer ~ trend() + season() + d2000))
report(fit_beer)
Series: Beer
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-43.0235 -7.5945 -0.4539 7.7754 21.4829
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 441.9154 3.8644 114.354 < 2e-16 ***
trend() -0.3548 0.1308 -2.712 0.00846 **
season()year2 -34.6452 3.9985 -8.665 1.36e-12 ***
season()year3 -17.8046 4.0536 -4.392 4.02e-05 ***
season()year4 72.8279 4.0594 17.941 < 2e-16 ***
d2000 0.7282 5.6402 0.129 0.89766
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.32 on 68 degrees of freedom
Multiple R-squared: 0.9243, Adjusted R-squared: 0.9188
F-statistic: 166.1 on 5 and 68 DF, p-value: < 2.22e-16
p <- augment(fit_beer) %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Beer, color = "Datos")) +
geom_line(aes(y = .fitted, color = "Fitted")) +
labs(x = "Año", y = "Megalitros")
ggplotly(p)
cerveza %>%
gg_tsdisplay(Beer %>% difference(4), plot_type = "partial")
Nonlinear regressions
Exponential models
Log-log models: \[\log y_t = \beta_0 + \beta_1\log x_t\]
Lin-log models: \[y_t = \beta_0 + \beta_1\log x_t\]
Log-lin models: \[\log y_t = \beta_0 + \beta_1x_t\]
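To see what the log-lin form buys us, a minimal sketch with simulated data (the 0.02 growth rate is an arbitrary illustration value): the slope of log(y) on x is approximately the proportional change in y per unit of x.

```r
# Log-lin model: log(y) = beta0 + beta1 * x, so beta1 ~ proportional growth
set.seed(7)
x <- 1:100
y <- exp(3 + 0.02 * x + rnorm(100, sd = 0.05))  # ~2% growth per unit of x
fit <- lm(log(y) ~ x)
coef(fit)[["x"]]   # close to 0.02
```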
Piecewise linear regression models
boston_men <- boston_marathon %>%
filter(Event == "Men's open division") %>%
mutate(Minutes = as.numeric(Time)/60)
p <- boston_men %>%
autoplot(Minutes) +
ggtitle("Tiempos ganadores del maratón de Boston, categoría abierta de hombres")
ggplotly(p)
fit_boston <- boston_men %>%
model(
lineal = TSLM(Minutes ~ trend())
)
fc_boston <- fit_boston %>% forecast(h = 10)
boston_men %>%
autoplot(Minutes) +
geom_line(aes(y = .fitted, color = .model), data = fitted(fit_boston)) +
autolayer(fc_boston, alpha = 0.5, level = 95) +
ggtitle("Maratón de Boston, cat. abierta de hombres")
fit_boston <- boston_men %>%
model(
lineal = TSLM(Minutes ~ trend()),
exponencial = TSLM(log(Minutes) ~ trend()),
`Reg. por partes` = TSLM(Minutes ~ trend(knots = c(1940,1980)))
)
fc_boston <- fit_boston %>% forecast(h = 10)
boston_men %>%
autoplot(Minutes) +
geom_line(aes(y = .fitted, color = .model), data = fitted(fit_boston)) +
autolayer(fc_boston, alpha = 0.5, level = 95) +
ggtitle("Maratón de Boston, cat. abierta de hombres")
fit_boston <- boston_men %>%
model(
lineal = TSLM(Minutes ~ trend()),
exponencial = TSLM(log(Minutes) ~ trend()),
`Reg. por partes` = TSLM(Minutes ~ trend(knots = c(1940,1980))),
`Reg. por partes2` = TSLM(Minutes ~ trend(knots = c(1975))),
`Reg. por partes3` = TSLM(Minutes ~ trend(knots = c(1915, 1950, 1988))),
`Reg. por partes4` = TSLM(Minutes ~ trend(knots = c(1915, 1927, 1950, 1985)))
)
fc_boston <- fit_boston %>% forecast(h = 10)
boston_men %>%
autoplot(Minutes) +
geom_line(aes(y = .fitted, color = .model), data = fitted(fit_boston)) +
autolayer(fc_boston, alpha = 0.5, level = 95) +
ggtitle("Maratón de Boston, cat. abierta de hombres")
accuracy(fit_boston) %>%
arrange(MAPE)
# A tibble: 6 × 11
Event .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Men's ope… Reg. … Trai… -9.24e-16 4.64 3.18 -0.0958 2.19 0.727 0.716 0.0323
2 Men's ope… Reg. … Trai… -3.46e-16 5.12 3.67 -0.116 2.53 0.838 0.789 0.196
3 Men's ope… Reg. … Trai… 9.24e-16 5.68 4.00 -0.141 2.72 0.913 0.876 0.312
4 Men's ope… Reg. … Trai… 6.93e-16 6.01 4.57 -0.164 3.15 1.04 0.928 0.402
5 Men's ope… expon… Trai… 1.26e- 1 6.10 4.76 -0.0845 3.29 1.09 0.941 0.415
6 Men's ope… lineal Trai… -4.62e-16 6.13 4.79 -0.169 3.31 1.09 0.945 0.417
Box-Cox
lambda <- boston_men %>%
features(Minutes, guerrero) %>%
pull(lambda_guerrero)
p1 <- boston_men %>%
autoplot(Minutes)
p2 <- boston_men %>%
autoplot(box_cox(Minutes, lambda = lambda))
p3 <- boston_men %>%
autoplot(log(Minutes))
p1/p2/p3